NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

On the Linear Speedup of Personalized Federated Reinforcement Learning with Shared Representations

Xiong, Guojun; Wang, Shufan; Jiang, Daniel; Li, Jian (April 2025, The Thirteenth International Conference on Learning Representations (ICLR 2025))

Federated reinforcement learning (FedRL) enables multiple agents to collaboratively learn a policy without needing to share the local trajectories collected during agent-environment interactions. However, in practice, the environments faced by different agents are often heterogeneous, but since existing FedRL algorithms learn a single policy across all agents, this may lead to poor performance. In this paper, we introduce a \emph{personalized} FedRL framework (PFedRL) by taking advantage of possibly shared common structure among agents in heterogeneous environments. Specifically, we develop a class of PFedRL algorithms named PFedRL-Rep that learns (1) a shared feature representation collaboratively among all agents, and (2) an agent-specific weight vector personalized to its local environment. We analyze the convergence of PFedTD-Rep, a particular instance of the framework with temporal difference (TD) learning and linear representations. To the best of our knowledge, we are the first to prove a linear convergence speedup with respect to the number of agents in the PFedRL setting. To achieve this, we show that PFedTD-Rep is an example of federated two-timescale stochastic approximation with Markovian noise. Experimental results demonstrate that PFedTD-Rep, along with an extension to the control setting based on deep Q-networks (DQN), not only improve learning in heterogeneous settings, but also provide better generalization to new environments.
more » « less
Free, publicly-accessible full text available April 24, 2026
DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback

Xiong, Guojun; Dinesha, Ujwal; Mukherjee, Debajoy; Li, Jian; Shakkottai, Srinivas (April 2025, The Thirteenth International Conference on Learning Representations (ICLR 2025))

Restless multi-armed bandits (RMAB) has been widely used to model constrained sequential decision making problems, where the state of each restless arm evolves according to a Markov chain and each state transition generates a scalar reward. However, the success of RMAB crucially relies on the availability and quality of reward signals. Unfortunately, specifying an exact reward function in practice can be challenging and even infeasible. In this paper, we introduce Pref-RMAB, a new RMAB model in the presence of preference signals, where the decision maker only observes pairwise preference feedback rather than scalar reward from the activated arms at each decision epoch. Preference feedback, however, arguably contains less information than the scalar reward, which makes Pref-RMAB seemingly more difficult. To address this challenge, we present a direct online preference learning (DOPL) algorithm for Pref-RMAB to efficiently explore the unknown environments, adaptively collect preference data in an online manner, and directly leverage the preference feedback for decision-makings. We prove that DOPL yields a sublinear regret. To our best knowledge, this is the first algorithm to ensure $$\tilde{\mathcal{O}}(\sqrt{T\ln T})$$ regret for RMAB with preference feedback. Experimental results further demonstrate the effectiveness of DOPL.
more » « less
Free, publicly-accessible full text available April 24, 2026
Decentralized Federated Learning with Model Caching on Mobile Agents

https://doi.org/10.1609/aaai.v39i20.35429

Wang, Xiaoyu; Xiong, Guojun; Cao, Houwei; Li, Jian; Liu, Yong (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

Federated Learning (FL) trains a shared model using data and computation power on distributed agents coordinated by a central server. Decentralized FL (DFL) utilizes local model exchange and aggregation between agents to reduce the communication and computation overheads on the central server. However, when agents are mobile, the communication opportunity between agents can be sporadic, largely hindering the convergence and accuracy of DFL. In this paper, we propose Cached Decentralized Federated Learning (Cached-DFL) to investigate delay-tolerant model spreading and aggregation enabled by model caching on mobile agents. Each agent stores not only its own model, but also models of agents encountered in the recent past. When two agents meet, they exchange their own models as well as the cached models. Local model aggregation utilizes all models stored in the cache. We theoretically analyze the convergence of Cached-DFL,explicitly taking into account the model staleness introduced by caching. We design and compare different model caching algorithms for different DFL and mobility scenarios. We conduct detailed case studies in a vehicular network to systematically investigate the interplay between agent mobility, cache staleness, and model convergence. In our experiments, Cached-DFL converges quickly, and significantly outperforms DFL without caching.
more » « less
Free, publicly-accessible full text available April 11, 2026
Decentralized Federated Learning with Model Caching on Mobile Agents

Wang, Xiaoyu; Xiong, Guojun; Cao, Houwei; Li, Jian; Liu, Yong (February 2025, AAAI 2025)

Free, publicly-accessible full text available February 28, 2026
Decentralized Federated Learning with Model Caching on Mobile Agents

Wang, Xiaoyu; Xiong, Guojun; Cao, Houwei; Li, Jian Li; Liu, Yong (February 2025, Proceedings of Thirty-Ninth AAAI Conference on Artificial Intelligence)

Federated Learning (FL) aims to train a shared model using data and computation power on distributed agents coordinated by a central server. Decentralized FL (DFL) utilizes local model exchange and aggregation between agents to reduce the communication and computation overheads on the central server. However, when agents are mobile, the communication opportunity between agents can be sporadic, largely hindering the convergence and accuracy of DFL. In this paper, we study delay-tolerant model spreading and aggregation enabled by model caching on mobile agents. Each agent stores not only its own model, but also models of agents encountered in the recent past. When two agents meet, they exchange their own models as well as the cached models. Local model aggregation works on all models in the cache. We theoretically analyze the convergence of DFL with cached models, explicitly taking into account the model staleness introduced by caching. We design and compare different model caching algorithms for different DFL and mobility scenarios. We conduct detailed case studies in a vehicular network to systematically investigate the interplay between agent mobility, cache staleness, and model convergence. In our experiments, cached DFL converges quickly, and significantly outperforms DFL without caching.
more » « less
Free, publicly-accessible full text available February 27, 2026
Whittle Index-Based Q-Learning for Wireless Edge Caching With Linear Function Approximation

https://doi.org/10.1109/TNET.2024.3417351

Xiong, Guojun; Wang, Shufan; Li, Jian; Singh, Rahul (October 2024, IEEE/ACM Transactions on Networking)

Full Text Available
Provably Efficient Reinforcement Learning for Adversarial Restless Multi-Armed Bandits with Unknown Transitions and Bandit Feedback

Xiong, Guojun; Li, Jian (July 2024, Proceedings of the 41st International Conference on Machine Learning (ICML), PMLR)

Full Text Available
Structured Reinforcement Learning for Delay-Optimal Data Transmission in Dense mmWave Networks

https://doi.org/10.1109/TWC.2024.3416437

Wang, Shufan; Xiong, Guojun; Zhang, Shichen; Zeng, Huacheng; Li, Jian; Panwar, Shivendra S (October 2024, IEEE Transactions on Wireless Communications)

Full Text Available
Online Restless Multi-Armed Bandits with Long-Term Fairness Constraints

https://doi.org/10.1609/aaai.v38i14.29489

Wang, Shufan; Xiong, Guojun; Li, Jian (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

Restless multi-armed bandits (RMAB) have been widely used to model sequential decision making problems with constraints. The decision maker (DM) aims to maximize the expected total reward over an infinite horizon under an “instantaneous activation constraint” that at most B arms can be activated at any decision epoch, where the state of each arm evolves stochastically according to a Markov decision process (MDP). However, this basic model fails to provide any fairness guarantee among arms. In this paper, we introduce RMAB-F, a new RMAB model with “long-term fairness constraints”, where the objective now is to maximize the longterm reward while a minimum long-term activation fraction for each arm must be satisfied. For the online RMAB-F setting (i.e., the underlying MDPs associated with each arm are unknown to the DM), we develop a novel reinforcement learning (RL) algorithm named Fair-UCRL. We prove that Fair-UCRL ensures probabilistic sublinear bounds on both the reward regret and the fairness violation regret. Compared with off-the-shelf RL methods, our Fair-UCRL is much more computationally efficient since it contains a novel exploitation that leverages a low-complexity index policy for making decisions. Experimental results further demonstrate the effectiveness of our Fair-UCRL.
more » « less
Full Text Available
DePRL: Achieving Linear Convergence Speedup in Personalized Decentralized Learning with Shared Representations

https://doi.org/10.1609/aaai.v38i14.29543

Xiong, Guojun; Yan, Gang; Wang, Shiqiang; Li, Jian (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

Decentralized learning has emerged as an alternative method to the popular parameter-server framework which suffers from high communication burden, single-point failure and scalability issues due to the need of a central server. However, most existing works focus on a single shared model for all workers regardless of the data heterogeneity problem, rendering the resulting model performing poorly on individual workers. In this work, we propose a novel personalized decentralized learning algorithm named DePRL via shared representations. Our algorithm relies on ideas from representation learning theory to learn a low-dimensional global representation collaboratively among all workers in a fully decentralized manner, as well as a user-specific low-dimensional local head leading to a personalized solution for each worker. We show that DePRL achieves, for the first time, a provable \textit{linear speedup for convergence} with general non-linear representations (i.e., the convergence rate is improved linearly with respect to the number of workers). Experimental results support our theoretical findings showing the superiority of our method in data heterogeneous environments.
more » « less
Full Text Available

« Prev Next »

Search for: All records